Abstract

Meta-analysis combines results from multiple studies, aiming to increase
power in finding their common effect. It would typically reject the null
hypothesis of no effect if any one of the studies shows strong significance. The
partial conjunction null hypothesis is rejected only when at least $r$ of $n$
component hypotheses are non-null, with $r = 1$ corresponding to a usual
meta-analysis. Compared with meta-analysis, it can encourage replicable findings
across studies. A by-product of applying it for different values of $r$ is a
confidence interval for $r$, quantifying the proportion of non-null studies.
Benjamini and Heller (2008) provided a valid test for the partial conjunction
null by ignoring the $r - 1$ smallest p-values and applying a valid meta-analysis
p-value to the remaining $n - r + 1$ p-values. We provide necessary and sufficient
conditions for a combined p-value to be admissible for the partial conjunction
hypothesis among monotone tests. Non-monotone tests always dominate monotone tests
but are usually too unreasonable to be used in practice. Based on these findings,
we propose a generalized form of Benjamini and Heller's test which allows
the use of various types of meta-analysis p-values, and we apply our method to an
example assessing the replicable benefit of new anticoagulants across subgroups
of patients for stroke prevention.


1 Introduction 

When a null hypothesis is tested in $n$ different settings, a meta-analysis can be used
to obtain a combined p-value based on all of the test results. It gains power as
the combined p-value is usually more significant than any of the individual p-values
in each setting. However, the combined p-value in meta-analysis is only valid for
the global null, where the null hypothesis is true in every setting. It is therefore
possible that the null is rejected largely on the basis of just one extremely significant
component hypothesis test. Such a rejection may be undesirable as it could arise
from some irreproducible property of the setting in which that one component test
was made.

Table 1: Four hypothetical cases for five ordered p-values.

Case   p_(1)       p_(2)       p_(3)       p_(4)      p_(5)
A      10^{-200}   0.4         0.5         0.6        0.7
B      10^{-10}    10^{-9}     10^{-8}     10^{-7}    10^{-6}
C      10^{-100}   10^{-100}   10^{-100}   0.049      0.8
D      0.048       0.048       0.048       0.048      0.8

Referring to Table 1, cases A and B illustrate the potential problem of
meta-analysis. Both a Fisher and a Stouffer meta-analysis would find case A more
significant than case B, while the only significant setting in case A may be largely due
to a technical or statistical bias. The random effects model in meta-analysis has
been widely accepted as a way to account for heterogeneity across studies (Higgins et al.,
2009). However, it still assumes that the effects across studies are similar, and it
neither explicitly guarantees replication nor is robust to extreme bias.

Researchers in functional magnetic resonance imaging (fMRI) have adopted
conjunction (logical 'and') testing (Price and Friston, 1997; Friston et al., 1999; Nichols
et al., 2005), in which a hypothesis must be rejected in all $n$ settings where it is tested.
The $n$ settings may correspond to related tasks or they may correspond to
independent subjects. For example, in Table 1 a conjunction test would only reject case B,
which shows consistent replication. However, conjunction tests lose power for large
$n$ as they are based on the largest of $n$ p-values. A compromise is to require evidence
that at least $r$ out of $n$ null hypotheses are false, for some user-specified $r$. Such tests
of the 'partial conjunction (PC) null hypothesis' were used in Friston et al. (2005)
and then studied by Benjamini and Heller (2008). The extremes $r = 1$ and $r = n$
correspond to the usual meta-analysis tests and conjunction testing respectively.

Partial conjunction testing is useful in areas beyond neuroimaging. It is a tool
for replicability analysis, which finds effects that are present in more than one study,
and has been applied in systematic reviews for healthcare prevention (Shenhav et al.,
2015) and genome-wide association studies (Heller et al., 2014). PC tests also have
potential uses in finding common gene regulation patterns across tissues for
eQTL data (Flutre et al., 2013). In addition, they can be applied to gene set enrichment
analysis to avoid selecting gene sets whose significance depends on only one single
gene (Wang et al., 2010).

A Benjamini-Heller partial conjunction (BHPC) test works as follows. One sorts
the observed p-values yielding $p_{(1)} \le p_{(2)} \le \cdots \le p_{(n)}$, ignores the smallest $r - 1$ of
them, and then applies a valid p-value combination rule to the remaining $n - r + 1$
p-values. Benjamini and Heller (2008) show that BHPC tests are valid for the
partial conjunction null when the $n$ hypotheses are independent. They also consider
some dependent test conditions as well as the consequences of using PC tests in the
Benjamini-Hochberg procedure.

Cases C and D illustrate an interesting property of the BHPC tests. Suppose
that we need to reject at least four null hypotheses to have a meaningful finding.
Then a BHPC test finds that case D is stronger evidence (smaller p-value) than
case C, because BHPC is based only on $p_{(4)}$ and $p_{(5)}$. In case C we are extremely
confident of three rejections and are banking on the fourth one to be correct. In case
D by contrast, none of the four smallest p-values is much better than borderline. It
appears to have about four times as many ways to disappoint us. This comparison
between cases C and D reveals a counter-intuitive property of the BHPC tests that
we study further.
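
The comparison of cases C and D can be checked numerically. The following is a minimal sketch, assuming independent p-values and Fisher's method as the combination rule; the function names `fisher_pvalue` and `bhpc_pvalue` are ours and purely illustrative, not part of any library or of Benjamini and Heller's text.

```python
import math


def fisher_pvalue(pvals):
    """Fisher's combined p-value for independent p-values.

    Uses the closed form of the chi-square survival function with an even
    number 2k of degrees of freedom, so only the standard library is needed.
    """
    k = len(pvals)
    t = -sum(math.log(p) for p in pvals)  # half of the Fisher statistic
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(k))


def bhpc_pvalue(pvals, r, combine=fisher_pvalue):
    """BHPC p-value: drop the r - 1 smallest p-values and combine the rest."""
    kept = sorted(pvals)[r - 1:]
    return combine(kept)


# Cases C and D of Table 1, tested with r = 4 (so only p_(4), p_(5) are used).
case_c = [1e-100, 1e-100, 1e-100, 0.049, 0.8]
case_d = [0.048, 0.048, 0.048, 0.048, 0.8]
p_c = bhpc_pvalue(case_c, r=4)
p_d = bhpc_pvalue(case_d, r=4)
print(p_c, p_d)
```

Under these assumptions the two BHPC p-values come out at roughly 0.166 for case C and 0.164 for case D, so case D is indeed ranked as slightly stronger evidence even though case C has three extremely small p-values.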

Here we investigate the power properties of BHPC tests, focusing on admissibility.
Under the assumption that the component p-values are either independent or
have a positive dependency structure, we characterize the complete class of
monotone admissible tests. The only admissible PC tests among monotone tests are
either of the BHPC form or of a generalized form (GBHPC), whose combined p-value
is the maximum of the meta-analysis p-values over each of the $\binom{n}{r-1}$
subsets of $n - r + 1$ hypotheses. Under mild assumptions, a sufficient condition for
the monotone admissibility of a GBHPC p-value is that each of the meta-analysis
p-values for the $\binom{n}{r-1}$ subsets is admissible. GBHPC p-values are also
called r-values in Shenhav et al. (2015).

The monotonicity condition, which means that the combined p-value is a
non-decreasing function of the individual p-values, is necessary for us to discuss
admissibility for partial conjunction hypotheses with $r > 1$. If we relax this condition, then
BHPC tests become inadmissible. Because non-monotone tests are quite unreasonable
scientifically in most cases, this is not a strong criticism of BHPC. We side with
Perlman and Wu (1999) in rejecting the admissibility criterion, not the test, when
methods lacking face-value validity are included in comparisons.
The admissibility properties show the superiority of BHPC p-values, but how can we
then explain their puzzling behavior alluded to for cases C and D in Table 1? An
explanation is that, unlike the combined p-values in meta-analysis, the PC p-values
measure the strength of replicability (the true proportion of non-null studies) instead
of the magnitude of the effect size. A PC p-value can be much smaller when the true
number ($r_0$) of non-null studies, rather than their effect size, is large. Compared with
case D, the three extreme p-values of case C in Table 1 give us much stronger evidence
of a large effect size but not much more evidence that $r_0 \ge 4$. Thus, the PC p-values
for both case C and case D are similar.

Section 2 presents our notation and some background on partial conjunction
tests and admissibility. Section 3 proposes the GBHPC p-values and presents the
main theorems on monotone admissible partial conjunction p-values. Section 4 is a
simulation study for the power comparison of several GBHPC p-values under various
configurations of hypotheses. Compared with BHPC p-values, GBHPC p-values have
the advantage that they can be constructed from more sophisticated meta-analysis
p-values. We illustrate this benefit in Section 5 in an application of GBHPC p-values
for assessing replicable benefit and safety concerns of new oral anticoagulants across
subgroups of patients for stroke prevention. Section 6 has our conclusions and states
some future work.


2 Preliminaries 

2.1 Definitions and notations 

The problem begins with $n$ null hypotheses to test, $H_{0i}$ for $i = 1, \ldots, n$. Each $H_{0i}$ is
the hypothesis to test for an individual setting/study. The corresponding alternative
hypotheses are $H_{1i}$. The $i$'th hypothesis refers to a parameter $\theta_i$. If $H_{0i}$ holds
then $\theta_i \in \Theta_{0i}$, while $H_{1i}$ specifies that $\theta_i \in \Theta_{1i}$. The parameter space for the $i$'th
hypothesis is $\Theta_i = \Theta_{0i} \cup \Theta_{1i}$ and of course $\Theta_{0i} \cap \Theta_{1i} = \emptyset$. The parameter space of
$(\theta_1, \ldots, \theta_n)$ is $\Theta = \prod_i \Theta_i$.

To each hypothesis, there corresponds a p-value, $p_i$. There may be a loss of
information in reducing a data set to one p-value. Yet often that loss is small, and
very commonly the researchers who gathered the original data share only their
p-values, for reasons that may include the privacy of their subjects.

We use $p_i$ to denote the numerical value of the p-value for the $i$'th hypothesis. It
is the observed value of a corresponding random variable $P_i$. The sorted p-values are
$p_{(1)} \le p_{(2)} \le \cdots \le p_{(n)}$, and $P_{(1)} \le P_{(2)} \le \cdots \le P_{(n)}$ are the sorted random variables.
Probability and expectation for functions of $P_i$ are given by $\mathbb{P}_{\theta_i}$ and $\mathbb{E}_{\theta_i}$ respectively.
We let $\theta = (\theta_1, \ldots, \theta_n)$ and $P = (P_1, \ldots, P_n)$. Probability and expectation for
functions of $P$ are given by $\mathbb{P}_\theta$ and $\mathbb{E}_\theta$. Each $P_i$ is a valid p-value according to the
definition below:

Definition 1 (Validity). A valid component p-value satisfies
$\sup_{\theta_i \in \Theta_{0i}} \mathbb{P}_{\theta_i}(P_i \le \alpha) \le \alpha$ for $0 \le \alpha \le 1$.

Besides independence, positive dependence can be a common dependency structure
across studies, especially when they share samples. For $P_1, \ldots, P_n$, we assume
that they are either independent or positively dependent (PRDS) (Benjamini and
Yekutieli, 2001) under any parameter $\theta \in \Theta$. By the definition of PRDS,
they have the following property: for all $(p_1, \ldots, p_n) \in [0,1]^n$ and all $\theta \in \Theta$,

    $\mathbb{P}_\theta(P_1 \le p_1, \ldots, P_n \le p_n) \ge \prod_i \mathbb{P}_{\theta_i}(P_i \le p_i)$.    (1)

For a given $r \le n$, the PC null hypothesis and alternative hypothesis are defined
as

    $H_0^{r/n}$: {at most $r - 1$ hypotheses are non-null}, and
    $H_1^{r/n}$: {at least $r$ hypotheses are non-null}.

The null space is defined as $\Theta_0^{r/n} = \{\theta \in \Theta : H_0^{r/n} \text{ is true}\}$.

We use $1{:}r$ to denote $\{1, 2, \ldots, r\}$ and similarly $(r+1){:}n = \{r+1, r+2, \ldots, n\}$.
The index set $u \subset 1{:}n$ has cardinality $|u|$ and complement $-u = 1{:}n \setminus u$. Under the
null hypothesis $H_{0u}$ we have $\theta_j \in \Theta_{0j}$ for all $j \in u$. The null space of $H_{0u}$ is denoted
as $\Theta_{0u}$.

Sometimes we combine points $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ into a point $z \in \mathbb{R}^n$ with
$z_j = x_j$ for $j \in u$ and $z_j = y_j$ for $j \notin u$. Such a hybrid point is denoted $z = x_u{:}y_{-u}$.
Let $u = \{i_1, i_2, \ldots, i_k\}$; then $\theta_u$ is defined as the combination $(\theta_{i_1}, \theta_{i_2}, \ldots, \theta_{i_k})$.

We can extend the definition of validity to meta-analysis and PC p-values. The
combination of $k$ p-values ($k$ may differ from $n$ later) produces the combined p-value
$P_{r/k} = f_{r,k}(P_1, \ldots, P_k)$, which is a valid p-value for testing $H_0^{r/k}$ if

    $\sup_{\theta \in \Theta_0^{r/k}} \mathbb{P}_\theta(P_{r/k} \le \alpha) \le \alpha, \quad \forall\, 0 \le \alpha \le 1.$

Definition 2 (Sensitivity). A sensitive p-value $P_{r/k} = f_{r,k}(P_1, \ldots, P_k)$ for $H_0^{r/k}$
satisfies

    $\liminf_{p_u \to 0} p_{r/k} = 0$

for all $u \subset 1{:}k$ with $|u| = r$.

Sensitivity requires that the PC p-value drops to 0 when we are certain to reject any
$r$ of the individual hypotheses. For a meta-analysis p-value $P_{1/k}$, it means that we reject
the global null when we are certain about any one of the individual hypotheses. We think
that it is a practically reasonable requirement for a p-value for testing $H_0^{r/k}$.

Here are some examples of valid and sensitive meta-analysis p-values given valid
p-values $p_1, \ldots, p_k$. The combination for a method $M$ is defined in terms of a function
$f_{M,k}$ which may incorporate sorting of its arguments.

Example 1. Simes' method:

    $p_{S,k} = f_{S,k}(p_1, \ldots, p_k) = \min_{i = 1, \ldots, k} \left\{ \frac{k\, p_{(i)}}{i} \right\}$.

Example 2. Fisher's method:

    $p_{F,k} = f_{F,k}(p_1, \ldots, p_k) = \mathbb{P}\left( \chi^2_{(2k)} \ge -2 \sum_{i=1}^{k} \log p_i \right)$.

Example 3. Weighted Stouffer test: Consider test statistics $T_i \sim \mathcal{N}(\sqrt{n_i}\,\theta_i/\sigma_i, 1)$,
with sample sizes $n_i$ for $i = 1, \ldots, k$ and known $\sigma_i > 0$. The p-value for the null that
$\theta_i = 0$ versus the alternative that $\theta_i > 0$ is $p_i = 1 - \Phi(T_i) = \Phi(-T_i)$. A weighted
Stouffer p-value for $H_0^{1/k}$ takes the form

    $p_{WS,k} = f_{WS,k}(p_1, \ldots, p_k) = 1 - \Phi\left( \frac{\sum_{i=1}^{k} \sqrt{n_i}\, \Phi^{-1}(1 - p_i)}{\sqrt{\sum_{i=1}^{k} n_i}} \right)$.

In fact, $p_{WS,k}$ can be used beyond one-sided tests. For example, for a two-sided test,
$\Phi^{-1}(1 - p_i)$ also has magnitude roughly proportional to $\sqrt{n_i}$. We shall illustrate
the performance under such usage in our simulations in Section 4.

Example 4. Truncated product method (Zaykin et al., 2002): this is a more recently
developed method to gain efficiency in the presence of outliers. The test statistic has
the form

    $W = \prod_{i=1}^{k} p_i^{1(p_i \le \tau)}$,

where $\tau$ is some pre-determined value. The TPM p-value for $H_0^{1/k}$ takes the form

    $p_{TPM,k} = \mathbb{P}(W \le w)$,

where $w$ is the observed value of $W$ and the distribution of $W$ was computed for both
independent and dependent scenarios in Zaykin et al. (2002).
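
The first three combination rules in Examples 1 to 3 are simple enough to sketch in a few lines. The code below is an illustrative sketch only, assuming independent one-sided p-values strictly between 0 and 1; the function names are ours, and the weighted Stouffer rule takes generic weights $w_i$ (choosing $w_i = \sqrt{n_i}$ is the sample-size weighting, which is our reading of Example 3).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def simes_pvalue(pvals):
    """Simes: min over i of k * p_(i) / i, with p_(i) the sorted p-values."""
    k = len(pvals)
    return min(k * p / i for i, p in enumerate(sorted(pvals), start=1))


def fisher_pvalue(pvals):
    """Fisher: P(chi2 with 2k df >= -2 * sum(log p_i)), closed form for even df."""
    k = len(pvals)
    t = -sum(math.log(p) for p in pvals)  # half of the Fisher statistic
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(k))


def weighted_stouffer_pvalue(pvals, weights):
    """Weighted Stouffer: z_i = Phi^{-1}(1 - p_i), combined with weights w_i.

    The combined statistic is sum(w_i * z_i) / sqrt(sum(w_i ** 2)).
    """
    z = [-_STD_NORMAL.inv_cdf(p) for p in pvals]  # equals Phi^{-1}(1 - p)
    num = sum(w * zi for w, zi in zip(weights, z))
    den = math.sqrt(sum(w * w for w in weights))
    return _STD_NORMAL.cdf(-num / den)  # equals 1 - Phi(num / den)


# Hypothetical inputs: three p-values with made-up sample sizes 50, 200, 80.
p = [0.01, 0.04, 0.03]
print(simes_pvalue(p), fisher_pvalue(p),
      weighted_stouffer_pvalue(p, [math.sqrt(m) for m in (50, 200, 80)]))
```

Each of these functions is non-decreasing in every argument, and Simes and Fisher are also symmetric, which is what the definitions that follow require.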
For non-symmetric meta-analysis p-values, there are also weighted versions of
Fisher tests. Note that each of the functions $f$ in the previous examples is monotone
according to this definition:

Definition 3 (Monotonicity). The p-value $f(p_1, \ldots, p_k)$ is monotone if the function
$f$ is non-decreasing in each argument. The set of such monotone p-value functions is
denoted $\mathcal{F}_{\text{mon}}$. A monotone test is one that rejects its null hypothesis for small values
of a monotone p-value.

A non-monotone test would reject its null hypothesis at some input $(p_1, \ldots, p_k)$
but fail to reject at some $(p'_1, \ldots, p'_k)$ with all $p'_i \le p_i$. Such a test is typically
unreasonable.

Besides monotonicity, we also clarify the definition of a symmetric combined
p-value:

Definition 4 (Symmetry). The combined p-value $f(p_1, \ldots, p_k)$ is symmetric if its
value stays unchanged under any permutation of $\{p_1, \ldots, p_k\}$.

Finally, we state the concept of admissibility using the definition of admissible
tests from Lehmann and Romano (2006, Chapter 6.7). Let a hypothesis test of
$H_0$ versus $H_1$ be described by a function $\varphi(X) \in \{0,1\}$ of the data $X$. If $\varphi(X) = 1$
then $H_0$ is rejected, and $\varphi(X) = 0$ otherwise. The test $\varphi$ is valid at level $\alpha$ if
$\sup_{\theta \in \Theta_0} \mathbb{E}_\theta(\varphi(X)) \le \alpha$. In our context, the data are a vector $P = (P_1, \ldots, P_n)$ of
p-values and $\varphi(P_1, \ldots, P_n) = 1_{f(P_1, \ldots, P_n) \le \alpha}$, where $f$ is a p-value combination function.

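Definition 5 ($\Psi$, $\alpha$-admissibility). The level-$\alpha$ test $\varphi \in \Psi$ is $\alpha$-admissible for testing
$H_0 : \theta \in \Theta_0$ against $H_1 : \theta \in \Theta_1$ if for any other level-$\alpha$ test $\varphi' \in \Psi$,

    $\mathbb{E}_\theta(\varphi') \ge \mathbb{E}_\theta(\varphi)$ for all $\theta \in \Theta_1$
    implies $\mathbb{E}_\theta(\varphi') = \mathbb{E}_\theta(\varphi)$ for all $\theta \in \Theta_1$.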

The definition of admissibility depends on the alternatives in $\Theta_1$ as well as the
space $\Psi$ of test functions. The constraints on $\Theta_1$ are important. For ordinary
meta-analysis, Birnbaum (1954) shows that every monotone p-value is admissible
when the component p-values are independent and the null hypothesis is simple,
because there is then some alternative at which that p-value gives optimal power.
However, those optimizing alternatives may not all be reasonable. Birnbaum (1955)
and Stein (1956) (generalized later by Matthes and Truax (1967) to include nuisance
parameters) also showed that for ordinary meta-analysis, when the test statistic
distribution is an exponential family with $\theta$ as canonical parameter, a necessary and
sufficient condition for admissibility is to have a closed convex acceptance region of
the underlying test statistics.

The space of test functions $\Psi$ traditionally contains all possible functions
when considering admissibility. However, for the partial conjunction null hypothesis,
we restrict $\Psi$ to only include tests using monotone p-values, to avoid unreasonable
more powerful tests (see Section 3.2 for details).

2.2 BHPC p-values 

Now we restate Theorem 1 of Benjamini and Heller (2008). 

Theorem 2.1. Let $P_1, \ldots, P_n$ be independent valid p-values, and for $k = n - r + 1$ let
$f_{M,k}(P_1, \ldots, P_k)$ be a valid and symmetric meta-analysis p-value where $f_{M,k} \in \mathcal{F}_{\text{mon}}$.
Then $P_{r/n} = f_{M,n-r+1}(P_{(r)}, P_{(r+1)}, \ldots, P_{(n)})$ is a valid p-value for $H_0^{r/n}$.

As mentioned, we call the combined p-value $P_{r/n}$ described in Theorem 2.1 a BHPC
p-value for short. In practice it makes sense to require that the p-value combination
function $f_{M,k}(\cdot)$, for $k = n - r + 1$, be a sensitive one for $H_0^{1/k}$. Notice that if $f_{M,k}$ were
a partial conjunction test of $H_0^{s/k}$ for $s > 1$, then $f_{M,k}$ is still a valid meta-analysis
p-value but is not sensitive any more. The BHPC p-values satisfy the chain rule in
the sense that $P_{r/n}$ in Theorem 2.1 now becomes a valid test for $H_0^{(r+s-1)/n}$. Though
$P_{r/n}$ would still be a valid test of $H_0^{r/n}$, it is less efficient.

Based on the relationship between hypothesis testing and building confidence
sets, if we have for each $r = 1, 2, \ldots, n$ a valid combined p-value $P_{r/n}$ for $H_0^{r/n}$, then
the $1 - \alpha$ confidence set for $r$ is

    $\hat{I} = \{r : P_{r/n} \le \alpha\}$.

Benjamini and Heller (2008) showed that if each $P_{r/n}$ is a BHPC p-value with $f_{M,k} = f_{S,k}$
as in Example 1, then we have $p_{1/n} \le p_{2/n} \le \cdots \le p_{n/n}$ and $\hat{I} = [\hat{r}, n]$ becomes a $1 - \alpha$
confidence interval with $\hat{r} = \max\{r : p_{r/n} \le \alpha\}$.
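
To make the construction of $\hat{r}$ concrete, here is a small sketch that computes the BHPC p-values with Simes' rule for every $r$ and reads off the largest $r$ that is rejected. The helper names and the five input p-values are hypothetical, and returning 0 when nothing is rejected is our own convention rather than something stated in the text.

```python
def simes_pvalue(pvals):
    """Simes: min over i of k * p_(i) / i, with p_(i) the sorted p-values."""
    k = len(pvals)
    return min(k * p / i for i, p in enumerate(sorted(pvals), start=1))


def bhpc_simes_path(pvals):
    """BHPC p-values p_{r/n} for r = 1, ..., n using Simes on p_(r), ..., p_(n)."""
    ordered = sorted(pvals)
    n = len(ordered)
    return [simes_pvalue(ordered[r - 1:]) for r in range(1, n + 1)]


def r_hat(pvals, alpha):
    """Largest r with p_{r/n} <= alpha (0 if no partial conjunction null is rejected)."""
    rejected = [r for r, p in enumerate(bhpc_simes_path(pvals), start=1) if p <= alpha]
    return max(rejected) if rejected else 0


# Hypothetical p-values from n = 5 studies.
pvals = [0.001, 0.002, 0.003, 0.6, 0.9]
path = bhpc_simes_path(pvals)
print(path)
print(r_hat(pvals, alpha=0.05))
```

For these inputs the path of p-values is non-decreasing in $r$, in line with the monotonicity result quoted above, and the sketch reports evidence for at least three non-null studies at level 0.05.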


3 GBHPC p-values 

Motivated by the BHPC p-value, we discuss a more general class of combined p-values
with good power properties. These are defined as GBHPC p-values:

Definition 6 (GBHPC p-value). For each $u \subset 1{:}n$ with $|u| = k = n - r + 1$ let
$g_u$ be a function from $[0,1]^k$ to $[0,1]$ such that $g_u$ is non-decreasing and is a valid
meta-analysis p-value for $H_{0u}$. Then

    $f^*(P) = f^*(P_1, \ldots, P_n) = \max_{u \subset 1:n,\ |u| = n-r+1} g_u(P_u)$    (2)

is a generalized BHPC (GBHPC) p-value.

The alternative hypothesis that at least $r$ out of $n$ hypotheses are false is
equivalent to the statement that among every $n - r + 1$ of the $n$ hypotheses, there is at least
one that is false. This is the reason that the GBHPC p-value is the maximum
of all the meta-analysis p-values of size $n - r + 1$. Using the above explanation, the next
proposition states the validity of GBHPC p-values under any dependency structure of
the individual p-values.

Proposition 3.1. Any GBHPC p-value is a valid p-value for $H_0^{r/n}$.

Proof. Consider a GBHPC p-value of the form (2). From the definition of $H_0^{r/n}$, for
all $\theta \in \Theta_0^{r/n}$, there exists $u$ with $|u| = n - r + 1$ such that $\theta_j \in \Theta_{0j}$ for all $j \in u$. Then
for any $\alpha \in [0,1]$,

    $\mathbb{P}_\theta(f^*(P) \le \alpha) \le \mathbb{P}_\theta(g_u(P_u) \le \alpha) = \mathbb{P}_{\theta_u}(g_u(P_u) \le \alpha) \le \alpha.$

Thus $f^*(P_1, \ldots, P_n)$ is valid for $H_0^{r/n}$. □

The BHPC p-value is a special case of GBHPC p-values. If all $g_u = g$ in (2)
and $g$ is a monotone and symmetric combined p-value, then $f^*(p) = g(p_{(r)}, \ldots, p_{(n)})$
becomes a BHPC p-value. Some meta-analysis methods, such as the weighted Stouffer
test in Example 3, treat their component p-values differently depending on the
relative sample sizes on which they are based. The GBHPC framework includes such
methods.
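
A direct, brute-force reading of equation (2) is sketched below. It assumes independent one-sided p-values strictly between 0 and 1; the helper names, the sample sizes, and the choice of $\sqrt{n_i}$ weights for the non-symmetric case are all illustrative assumptions of ours.

```python
import math
from itertools import combinations
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fisher_pvalue(pvals):
    """Fisher's combined p-value (closed form for even degrees of freedom)."""
    k = len(pvals)
    t = -sum(math.log(p) for p in pvals)
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(k))


def gbhpc_pvalue(pvals, r, combine_for_subset):
    """Equation (2): max over index subsets u of size n - r + 1 of g_u(p_u).

    combine_for_subset(u, p_u) plays the role of g_u and may depend on u.
    """
    n = len(pvals)
    k = n - r + 1
    return max(combine_for_subset(u, [pvals[i] for i in u])
               for u in combinations(range(n), k))


# Hypothetical sample sizes, used only to weight the Stouffer combination.
sample_sizes = [50, 200, 80, 120, 100]


def stouffer_for_subset(u, p_u):
    """Non-symmetric g_u: weighted Stouffer with weights sqrt(n_i), i in u."""
    w = [math.sqrt(sample_sizes[i]) for i in u]
    z = [-_STD_NORMAL.inv_cdf(p) for p in p_u]
    stat = sum(wi * zi for wi, zi in zip(w, z)) / math.sqrt(sum(wi * wi for wi in w))
    return _STD_NORMAL.cdf(-stat)


p = [0.2, 0.001, 0.03, 0.6, 0.04]
r = 3
symmetric = gbhpc_pvalue(p, r, lambda u, p_u: fisher_pvalue(p_u))
bhpc = fisher_pvalue(sorted(p)[r - 1:])
weighted = gbhpc_pvalue(p, r, stouffer_for_subset)
print(symmetric, bhpc, weighted)
```

With a symmetric monotone rule such as Fisher's, the maximum over subsets is attained at the $n - r + 1$ largest p-values, so `symmetric` and `bhpc` agree up to rounding; the weighted version has no such shortcut in general and needs the full enumeration.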

Next we show that a sensitive GBHPC p-value has a unique representation in the
form of (2).

Proposition 3.2. If a GBHPC p-value $f^*$ is sensitive, then each $g_u$ is also sensitive
and the representation of $f^*$ in (2) is unique with

    $g_u(p_u) = \inf_{p_{-u} \in (0,1]^{r-1}} f^*(p_1, \ldots, p_n)$.    (3)
Proof. Consider any given $u \subset 1{:}n$ with $|u| = n - r + 1$. Then for all $i \in u$, let
$u_i = \{-u, i\}$, so that $|u_i| = r$. As $f^*$ is sensitive, we have

    $\liminf_{p_i \to 0} g_u(p_u) = \liminf_{p_{u_i} \to 0} g_u(p_u) \le \liminf_{p_{u_i} \to 0} f^*(p) = 0.$

Thus, $g_u$ is sensitive for every $u$ with $|u| = n - r + 1$. On the other hand, for a given
$u_0$ with $|u_0| = n - r + 1$, using definition (2), we have

    $\inf_{p_{-u_0} \in (0,1]^{r-1}} f^*(p) = \inf_{p_{-u_0} \in (0,1]^{r-1}} \max_u g_u(p_u) = g_{u_0}(p_{u_0}).$

This also proves the uniqueness of the representation. □

Remark 3.1. Equation (3) also shows that any symmetric GBHPC p-value is a BHPC
p-value. If $f^*$ is symmetric, then $g_u$ constructed in (3) is also valid, monotone and
symmetric. Also, $g_u = g$ is the same for all $u$. Thus $f^* = g(p_{(r)}, \ldots, p_{(n)})$ is a BHPC
p-value.

Remark 3.2. Equation (2) involves taking the maximum of meta-analysis p-values
over $\binom{n}{r-1}$ subsets. Thus, the computational cost of non-symmetric
GBHPC p-values can be high for large $r$ and $n$. One situation in which to use a non-symmetric
GBHPC p-value is when $r = 2$; then $\binom{n}{r-1} = n$, which is typically acceptable.
In addition, it is sometimes possible to make use of the special structure of the problem
to construct easy-to-compute non-symmetric GBHPC p-values. For instance, in
Section 5 the GBHPC p-values we constructed for the pharmaceutical data have almost
the same computational cost as BHPC p-values, but can give much smaller p-values.
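
The growth in the number of subsets is easy to tabulate. The snippet below is only an illustration of the count in Remark 3.2; the function name `num_subsets` is ours.

```python
import math


def num_subsets(n, r):
    """Number of index subsets of size n - r + 1 that equation (2) maximizes over."""
    return math.comb(n, n - r + 1)


for n, r in [(10, 2), (10, 5), (20, 10)]:
    # comb(n, n - r + 1) equals comb(n, r - 1) by symmetry of binomial coefficients.
    print(n, r, num_subsets(n, r))
```

For $r = 2$ the count is just $n$, while for $n = 20$ and $r = 10$ a brute-force evaluation already requires 167960 meta-analysis p-values.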


3.1 Monotone $\alpha$-admissibility

Now we discuss necessary and sufficient conditions for admissible combined
p-values of a PC hypothesis. Each of our results uses some combination of the following
three assumptions on the individual p-values.

Assumption 1 (Strong alternatives). For all $\alpha > 0$ and $i = 1, \ldots, n$,
$\sup_{\theta_i \in \Theta_{1i}} \mathbb{P}_{\theta_i}(P_i \le \alpha) = 1$.

Assumption 2 (Continuity). For all $\theta_i \in \Theta_{1i}$ and $i = 1, \ldots, n$, $\mathbb{P}_{\theta_i}(P_i = 0) = 0$.

Assumption 3 (Completeness). The family $\{\mathbb{P}_{\theta_u} : \theta_u \notin \Theta_{0u}\}$ for any subset $u$ with
$|u| = n - r + 1$ is complete.
Assumption 1 states that for each individual hypothesis there are strong enough
alternatives that we can almost certainly reject the null. Assumption 2 is a technical
assumption that the probability that the p-value is exactly 0 is zero under
any alternative. The completeness in Assumption 3 guarantees that if two level
$\alpha$ meta-analysis tests for $H_{0u}$ have the same power at every point in the alternative
space then they are the same test. Roughly speaking, both Assumptions 1 and 3
require that the alternative space of each individual hypothesis is large enough to
include various possibilities. The three assumptions can be satisfied in common tests.

For example, tests satisfying Assumption 1 include testing the parameters of
exponential families and location families. Lehmann and Romano (2006, Theorem
4.3.1) show that completeness is satisfied for testing the natural parameter of a
$k$-dimensional exponential family if the alternative space $\Theta_{1i}$ contains a $k$-dimensional
rectangle. If the individual p-values are independent, then completeness of the
alternative space for each individual hypothesis implies Assumption 3. Thus, we believe
that Assumptions 1 to 3 can cover a large class of problems.

Theorem 3.3 shows that GBHPC p-values form a complete class of monotone
$\alpha$-admissible p-values for $H_0^{r/n}$. Theorem 3.4 states that a sufficient condition for a
sensitive GBHPC p-value to be monotone $\alpha$-admissible is that each $g_u$ is admissible
for $H_{0u}$.

Theorem 3.3. Let $P_1, \ldots, P_n$ be independent or positively dependent (PRDS)
p-values satisfying Assumptions 1 and 2. Let $P_{r/n}$ be a valid monotone p-value for
$H_0^{r/n}$. Then there exists a valid GBHPC p-value $P^*_{r/n}$ that is uniformly at least as
powerful as $P_{r/n}$.

Theorem 3.4. Let $P_1, \ldots, P_n$ be PRDS p-values satisfying Assumptions 1 to 3. For
a sensitive GBHPC p-value $P^*_{r/n} = f^*(P)$ of the form (2), a sufficient condition
for $P^*_{r/n}$ to be monotone $\alpha$-admissible is that each $g_u$ is an admissible meta-analysis
p-value for $H_{0u}$.

We introduce the following lemma, which is the key reason that Theorems 3.3 and 3.4
hold. It shows that given a valid monotone p-value that is not of the GBHPC form,
we can expand its rejection region while retaining its validity.

Lemma 3.5. Let $P_1, \ldots, P_n$ be PRDS p-values satisfying Assumption 1. Let $f(P_1, \ldots, P_n)$
be a valid monotone p-value for $H_0^{r/n}$, and for $u \subset 1{:}n$ with $|u| = n - r + 1$, define

    $g_u(p_u) = \inf_{p_{-u} \in (0,1]^{r-1}} f(p_1, \ldots, p_n)$.    (4)

Then $g_u$ is a valid monotone meta-analysis p-value for $H_{0u}$.
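Proof. Monotonicity of $f$ implies monotonicity and measurability of $g_u$. Next,
suppose that $g_u$ is not valid for $H_{0u}$. Then there is an $\alpha \in [0,1]$ and a $\theta^*_u$ with $\theta^*_j \in \Theta_{0j}$
for all $j \in u$ such that

    $\mathbb{P}_{\theta^*_u}(g_u(P_u) \le \alpha) = \mathbb{P}_{\theta^*_u}\left(\inf_{p_{-u} \in (0,1]^{r-1}} f(P) \le \alpha\right) > \alpha + \epsilon$

for some $\epsilon > 0$. From the monotonicity of $f$, there is some fixed $\rho \in (0,1]$ with
$\mathbb{P}_{\theta^*_u}(f(P_u, p_{-u}) \le \alpha) > \alpha + \epsilon$ for any $p_{-u} \in [0,\rho]^{r-1}$. Since the p-values are PRDS
and $\{p : f(p_u, p_{-u}) \le \alpha\}$ is a decreasing set for any fixed $p_{-u}$, we have

    $\mathbb{P}_\theta\left(f(P) \le \alpha \mid P_{-u} \in [0,\rho]^{r-1}\right) > \alpha + \epsilon/2$

for any $\theta$ satisfying $\theta_u = \theta^*_u$. Using Assumption 1, there also exists $\theta^*_{-u}$ with $\theta^*_j \in \Theta_{1j}$
for all $j \in -u$ such that $\mathbb{P}_{\theta^*_j}(P_j \le \rho) > ((\alpha + \epsilon/2)/(\alpha + \epsilon))^{1/(r-1)}$. Since the p-values
are PRDS, using (1) we have

    $\mathbb{P}_{\theta^*}(f(P) \le \alpha) \ge \mathbb{P}_{\theta^*}\left(f(P) \le \alpha,\ P_{-u} \in [0,\rho]^{r-1}\right) > \alpha + \epsilon/2$

contradicting the validity of $f(P)$. □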

Now we are ready to prove Theorems 3.3 and 3.4.

Proof of Theorem 3.3. Let $g_u(P_u)$ be defined as in (4). Then $P_{r/n} \ge P^*_{r/n}$ when $P \in (0,1]^n$.
Using Assumption 2, $P^*_{r/n}$ is then uniformly at least as powerful as $P_{r/n}$. It then
follows directly from Proposition 3.1 and Lemma 3.5 that $P^*_{r/n}$ is a valid GBHPC p-value. □

Using Proposition 3.1, to prove Theorem 3.4 we only need to prove the monotone
$\alpha$-admissibility of $P^*_{r/n}$.

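Proof of Theorem 3.4. To prove the monotone $\alpha$-admissibility of $f^*(P_1, \ldots, P_n)$, suppose
that there is a valid monotone test $f^{**}$ satisfying $\mathbb{P}_\theta(f^{**}(P) \le \alpha) \ge \mathbb{P}_\theta(f^*(P) \le \alpha)$
for all $\theta \in \Theta \setminus \Theta_0^{r/n}$. By Theorem 3.3 we can assume that $f^{**}$ is a GBHPC p-value:

    $f^{**}(P) = \max_{u \subset 1:n,\ |u| = n-r+1} g'_u(P_u),$

where $g'_u$ is a valid monotone meta-analysis p-value. Notice that since $f^*$ is sensitive,
equation (3) holds. We now show that for each $u \subset 1{:}n$ with $|u| = n - r + 1$, and
any $\theta_u \notin \Theta_{0u}$,

    $\mathbb{P}_{\theta_u}\left(\inf_{p_{-u}} f^*(P) \le \alpha\right) \le \mathbb{P}_{\theta_u}\left(\inf_{p_{-u}} f^{**}(P) \le \alpha\right) = \beta'$    (5)

using a similar strategy as in the proof of Lemma 3.5. If (5) does not hold for some set
$u$ and a corresponding $\theta_u$, then there exist some $\epsilon > 0$ and $\rho \in (0,1]$ such that
$\mathbb{P}_{\theta_u}(f^*(P_u, p_{-u}) \le \alpha) > \beta' + \epsilon$ for any $p_{-u} \in (0,\rho]^{r-1}$. Using Assumption 1, there
exists $\theta^*$ with $\theta^*_j \in \Theta_{1j}$ for $j \in -u$ such that $\mathbb{P}_{\theta^*_j}(P_j \le \rho) > ((\beta' + \epsilon/2)/(\beta' + \epsilon))^{1/(r-1)}$.
Thus,

    $\mathbb{P}_{(\theta_u : \theta^*_{-u})}(f^*(P) \le \alpha) \ge \mathbb{P}_{(\theta_u : \theta^*_{-u})}(f^*(P) \le \alpha,\ P_j \le \rho\ \forall j \in -u) > \beta' + \epsilon/2$
    $\quad > \mathbb{P}_{\theta_u}\left(\inf_{p_{-u}} f^{**}(P) \le \alpha\right) \ge \mathbb{P}_{(\theta_u : \theta^*_{-u})}(f^{**}(P) \le \alpha)$

which violates the assumption that $f^{**}$ is uniformly at least as powerful as $f^*$. Thus,
(5) holds. Equations (3) and (5) imply that $\mathbb{P}_{\theta_u}(g'_u(P_u) \le \alpha) \ge \mathbb{P}_{\theta_u}(g_u(P_u) \le \alpha)$ for any
$\theta_u \notin \Theta_{0u}$ and any $\alpha \in [0,1]$. As $g_u(P_u)$ is $\alpha$-admissible for $H_{0u}$, we have
$\mathbb{P}_{\theta_u}(g'_u(P_u) \le \alpha) = \mathbb{P}_{\theta_u}(g_u(P_u) \le \alpha)$. Further, using Assumption 3 we have $g'_u(P_u) = g_u(P_u)$ a.e.
Thus, for all $\theta \in \Theta \setminus \Theta_0^{r/n}$, $\mathbb{P}_\theta(f^{**}(P) \le \alpha) = \mathbb{P}_\theta(f^*(P) \le \alpha)$, which shows that $f^*$ is
monotone $\alpha$-admissible for $H_0^{r/n}$. □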

Remark 3.3. As mentioned in Remark 3.1, the BHPC p-values are symmetric GBHPC
p-values. As a consequence, the BHPC p-values characterize the form of symmetric
monotone admissible combined p-values.

Combining Theorem 3.4 with results of Birnbaum (1955) and Lehmann and Romano (2006,
Theorem 6.7.1), who characterized admissible tests for the global null in exponential
families, we can give more specific conditions when Theorem 3.4 is applied to exponential
families. Here is the result for a simple scenario.

Example 5. Suppose that independent test statistics $T_i$ for $i = 1, \ldots, n$ are available
on hypotheses $H_{0i} : \theta_i = a_i \in \mathbb{R}$ against $H_{1i} : \theta_i \in \Theta_{1i}$. Here we assume that
every $T_i$ is the sufficient statistic for an exponential family with natural parameter
$\theta_i$. For a sensitive GBHPC p-value $f^*(p)$, suppose that $\mathbb{P}_{\theta_u}(g_u(P_u) \le \alpha) = \alpha$ for $\theta_u \in \Theta_{0u}$.
Also, for all $\alpha \in [0,1]$ the set of $T_u$ for which $g_u(P_u) > \alpha$ (the acceptance region) is
a closed and convex set, except for a subset of measure 0. Then $f^*(p)$ is monotone
$\alpha$-admissible for $H_0^{r/n}$.

Related work on convexity and admissibility also appears in Matthes and Truax
(1967) for testing parameters of exponential families in the presence of nuisance
parameters, Marden et al. (1982) and Brown and Marden (1989) for generalization to
distribution families beyond exponential families, and Owen (2009) for tests powerful
against alternatives with concordant signs. Notice that the $n$-dimensional set of
test statistics $T$ itself for which $f^*(p) > \alpha$ is not convex. For partial conjunctions,
the null hypothesis for the parameter usually includes all of the coordinate axes, and
the smallest convex set containing the axes is all of Euclidean space. As a result,
convexity of the acceptance region is not appropriate to partial conjunction testing.
3.2 Inadmissibility

In Section 3.1 we constructed monotone $\alpha$-admissible p-values for $H_0^{r/n}$. We now show
that they fail to be admissible if we allow non-monotone tests. For the case $n = r = 2$,
the construction of such counter-examples dates back to Lehmann (1952) and Iwasa
(1991).

Here we demonstrate that if we don't require monotone tests then a BHPC test is
inadmissible. Let $n = r = 2$. If both $P_1$ and $P_2$ are $\alpha$-admissible, then using Theorems 3.3 and
3.4, the constructed combined p-value is just $P_{(2)}$, which is monotone admissible. At
a given $\alpha$, the critical function is $\varphi = 1_{p_{(2)} \le \alpha}(p_1, p_2)$.

Now we can easily construct a more powerful $\alpha$-level test, by adding to the original
rejection region a square around the top-right corner in the p-value space (solid
shaded regions in Figure 1). Define the set

    $S = \{(p_1, p_2) \mid p_{(1)} > 1 - \alpha\}$, if $\alpha \le 1/2$,
    $S = \{(p_1, p_2) \mid p_{(1)} > \alpha\}$, if $\alpha > 1/2$.

Then the test $\varphi'$ with critical function $\varphi'(P) = \varphi(P) + 1_{(P_1, P_2) \in S}$ is uniformly and
strictly more powerful than $\varphi$. To prove that $\varphi'$ is an $\alpha$-level test, we note that
$S \cap \{p_{(2)} \le \alpha\} = \emptyset$. Therefore $\mathbb{E}(\varphi'(P) \mid P_1 = p_0) \le \alpha$ holds for any $p_0 \in [0,1]$.
Similarly, $\mathbb{E}(\varphi'(P) \mid P_2 = p_0) \le \alpha$. Since $p_0$ is arbitrary we conclude that $\varphi'$ is an
$\alpha$-level test. Actually, as shown in Figure 1, we can further expand the rejection region
of $\varphi'$ to include also the dotted shaded regions and get an even more powerful but
still valid test, which we denote $\tilde{\varphi}$. The rejection region of $\tilde{\varphi}$ in the p-value space consists of small
squares along the diagonal line.
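
The level argument for the enlarged test can be checked numerically for $n = r = 2$. The sketch below is illustrative only: it assumes that the null p-value is exactly uniform on $[0,1]$ and independent of the other p-value, approximates the conditional rejection probability on a midpoint grid, and uses function names of our own choosing.

```python
def phi(p1, p2, alpha):
    """BHPC test for n = r = 2: reject when max(p1, p2) <= alpha."""
    return 1 if max(p1, p2) <= alpha else 0


def phi_prime(p1, p2, alpha):
    """phi plus the extra square S in the top-right corner of the p-value space."""
    cut = 1 - alpha if alpha <= 0.5 else alpha
    in_s = min(p1, p2) > cut
    return phi(p1, p2, alpha) + (1 if in_s else 0)


def conditional_rejection_rate(test, p_fixed, alpha, grid=1000):
    """Rejection rate given one p-value fixed at p_fixed.

    The other p-value is assumed uniform on [0, 1] (a null hypothesis) and is
    represented by a midpoint grid, so the result is a deterministic average.
    """
    points = [(i + 0.5) / grid for i in range(grid)]
    return sum(test(p, p_fixed, alpha) for p in points) / grid


alpha = 0.1
for p0 in (0.05, 0.5, 0.95):
    print(p0,
          conditional_rejection_rate(phi, p0, alpha),
          conditional_rejection_rate(phi_prime, p0, alpha))
```

Under these assumptions the conditional rejection rate of the enlarged test never exceeds $\alpha$, yet it rejects at $(0.95, 0.95)$ while failing to reject at $(0.5, 0.5)$, which is exactly the kind of non-monotone behavior that makes such tests scientifically unappealing.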

If the test statistics are $Z_1 \sim \mathcal{N}(\mu_1, 1)$ and $Z_2 \sim \mathcal{N}(\mu_2, 1)$, and $P_1$ and $P_2$ are
two-sided tests for the means $\mu_1$ and $\mu_2$ respectively, then the top two plots of Figure 1
show the rejection regions of $\varphi'$ and $\tilde{\varphi}$ at level $\alpha = 0.1$ in the p-value space and in
the test statistic space. The bottom two plots compare the power of $\varphi$ and $\tilde{\varphi}$ as a
function of $(\mu_1, \mu_2)$. They show that the power gain of the non-monotone $\tilde{\varphi}$ only
appears in the low power region where the power is below or near $\alpha$.

The more powerful test $\varphi'$ increases power by strangely rejecting when both
input p-values are large enough. We now use this same approach to show that
without the monotonicity constraint, any GBHPC p-value is inadmissible for any $n$
and any $r \in 2{:}n$. The counter-examples reject $H_0^{r/n}$ when all p-values are large. The
idea is to show that for any GBHPC test, it is always possible to add a "box"-shaped
rejection region like the square around the origin in the right panel of Figure 1 while
still keeping the test valid. The point is not to advocate for such tests, but rather
to reinforce the idea that admissibility is only a useful concept within a well chosen
class of functions.

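We need the following mild technical constraint to guarantee that the "box" we
choose can really increase power for at least one alternative hypothesis.

Assumption 4. For each $i \in 1{:}n$, there exists $\theta^0_i \in \Theta_{0i}$ such that $\mathbb{P}_{\theta^0_i}(P_i \le \alpha) =
\sup_{\theta_i \in \Theta_{0i}} \mathbb{P}_{\theta_i}(P_i \le \alpha)$ for all $\alpha \in [0,1]$. Let $\theta^0 = (\theta^0_1, \theta^0_2, \ldots, \theta^0_n)$. Then for any set $A$,
if $\mathbb{P}_{\theta^0}(A) > 0$, then there exists $\theta^1 \in \Theta \setminus \Theta_0^{r/n}$ such that $\mathbb{P}_{\theta^1}(A) > 0$.

Theorem 3.6. Let $P_1, \ldots, P_n$ be independent p-values satisfying Assumptions 1, 2,
and 4. Let $1 < r \le n$ and $\alpha \in (0,1)$. Then any monotone $\alpha$-admissible combined
p-value for testing $H_0^{r/n}$ is not $\alpha$-admissible without the monotonicity constraint.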

Proof. Using 3.3, we only need to consider a GBHPC p-value $f^*$ as defined in
(6). Let $\theta^0 = (\theta_1^0, \theta_2^0, \ldots, \theta_n^0)$ be the parameter in
Assumption 4. Define

$$R = \{p \in [0,1]^n : f^*(p) \le \alpha\} = \bigcap_{u \subset 1:n,\ |u| = n-r+1} R_u$$

where $R_u = \{p \in [0,1]^n : g_u(p_u) \le \alpha\}$ and $g_u$ is defined in (2).

First, as $P_{\theta^0}(f^* \le \alpha) \le \alpha < 1$ and $f^*$ is non-decreasing,
there exists some $p_0 < 1$ such that if $p_j > p_0$ for all $j \in 1:n$ then
$f^*(p) > \alpha$.

Then, we show that there must exist a set $u^*$ with
$P_{\theta^0}(R_{u^*} \cap R^c) = \epsilon > 0$, where $R^c$ is the complement of $R$.
If this does not hold, then for any $u \subset 1:n$ with $|u| = n-r+1$, the equation
$1_{f^*(p) \le \alpha}(p) = 1_{g_u(p_u) \le \alpha}(p)$ holds a.e. $P_{\theta^0}$. This
implies that $1_{f^* \le \alpha}$ does not depend on $p_{-u}$ except on a set of
probability zero under $P_{\theta^0}$. As
$\bigcup_{u \subset 1:n,\ |u| = n-r+1} (-u) = 1:n$, we get that $1_{f^* \le \alpha}$
does not depend on any $p_j$ except on a set of probability zero under
$P_{\theta^0}$, which implies that $1_{f^* \le \alpha} = 1$ or $0$ a.e. $P_{\theta^0}$.
Such a test is either invalid or trivially not admissible, which contradicts our
assumptions.

As a consequence, we have
$P_{\theta^0}(f^* \le \alpha) = P_{\theta^0}(R_{u^*}) - \epsilon \le \alpha - \epsilon$.
Notice that
$P_{\theta^0}(f^* \le \alpha) = E_{\theta^0}\big[P_{\theta^0}(f^* \le \alpha \mid P_{-u})\big]$
for any $u$. Using the fact that $f^*$ is non-decreasing,
$P_{\theta^0}[f^* \le \alpha \mid P_{-u} = p_{-u}]$ is non-increasing in $p_{-u}$. Thus
there exists $\tilde{p} < 1$ such that for any $u$, if
$p_{-u} \in [\tilde{p}, 1]^{r-1}$, then

$$P_{\theta^0}[f^* \le \alpha \mid P_{-u} = p_{-u}] \le \alpha - \epsilon.$$

Let $p^* = \max(p_0, \tilde{p}, 1 - [\ldots])$ and
$S = \bigcap_i \{p \in [0,1]^n : p_i > p^*\}$. Then we construct a new test with
critical function $\varphi$: $\varphi = 1_{f^* \le \alpha} + 1_S$.

As $\{p \in [0,1]^n : f^*(p) \le \alpha\} \cap S = \emptyset$, we know that $\varphi$
is at least as powerful as $1_{f^* \le \alpha}$. Using Assumption 4, as
$P_{\theta^0}(S) \ge (1 - p^*)^n > 0$, there exists $\theta^1$ in the alternative
parameter space with $P_{\theta^1}(S) > 0$. Thus, $\varphi$ strictly dominates
$1_{f^* \le \alpha}$ at $\theta^1$. Finally, for all $p \in [0,1]^n$ and all
$u \subset 1:n$ with $|u| = n-r+1$, if $\theta_i \in \Theta_{0i}$ for all $i \in u$,
then

$$E_{\theta_u}[\varphi \mid P_{-u} = p_{-u}] \le P_{\theta_u}[f^* \le \alpha \mid P_{-u} = p_{-u}] + \epsilon\, 1_{p_{-u} \in [p^*,1]^{r-1}}$$

$$\le P_{\theta^0}[f^* \le \alpha \mid P_{-u} = p_{-u}] + \epsilon\, 1_{p_{-u} \in [p^*,1]^{r-1}} \le \alpha.$$

The second inequality above follows from Assumption 4, independence of the
individual p-values and monotonicity of $f^*$. Thus $\varphi$ is still an
$\alpha$-level test for $H_0^{r/n}$. This shows that $f^*$ is not
$\alpha$-admissible. □

4 Simulation 

In this simulation example, we compare the power of several GBHPC p-values for
testing the PC hypothesis $H_0^{r/n}$ with $n = 8$ studies and $r = 2$. Compared
with other $r$ values, the null hypothesis $H_0^{2/n}$ is often of particular
interest as it tests whether the significance of the effect replicates across
studies. It is also the case where the computational cost of a non-symmetric GBHPC
p-value would typically not be a concern.

We consider the alternative whose true number of non-null hypotheses is one of
$r_0 = 2, 4, 6$. We assume that all the individual hypotheses are independent. Each
p-value $P_i$ is a two-sided p-value of the corresponding z-value
$Z_i \sim N(\sqrt{W_i}\,\mu_i, 1)$ for $i = 1, 2, \ldots, 8$. We set three of the
sample sizes $W_i$ of the eight individual studies to 100, another three of them to
500 and the last two to 1000. For the effects $\mu_i$, if $H_{0i}$ is true, then
$\mu_i = 0$. Otherwise, when $H_{0i}$ is false, we generate
$\mu_i \sim \mathrm{Gamma}(\alpha_0, \beta_0)$. We define $\mu_0 = \alpha_0/\beta_0$
and $\sigma_0 = \sqrt{\alpha_0}/\beta_0$, which are the mean and standard deviation
of the non-null effect across studies. We compare the power of each GBHPC p-value
as a function of $(\mu_0, \sigma_0)$ at the significance level of $\alpha = 0.05$.
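
The setup above can be sketched as a small Monte Carlo experiment. The Python
sketch below is not the code behind Figure 2. It assumes that the non-null studies
are chosen uniformly at random among the eight (the text does not say which ones
are non-null), that the Gamma distribution is parameterized by shape $\alpha_0$ and
rate $\beta_0$ (as implied by the formulas for $\mu_0$ and $\sigma_0$), and that the
Fisher and Simes BHPC p-values take the forms given by Benjamini and Heller
(2008). All function names are hypothetical, and the weighted Stouffer variant is
omitted.

```python
import math
import random

SAMPLE_SIZES = [100] * 3 + [500] * 3 + [1000] * 2   # W_i for the n = 8 studies


def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_p_values(r0, mu0, sigma0, rng):
    """One draw of the 8 p-values; r0 randomly chosen studies are non-null."""
    shape = (mu0 / sigma0) ** 2       # alpha_0, since mu0 = alpha_0 / beta_0
    rate = mu0 / sigma0 ** 2          # beta_0, since sigma0 = sqrt(alpha_0) / beta_0
    non_null = set(rng.sample(range(len(SAMPLE_SIZES)), r0))
    p_values = []
    for i, w in enumerate(SAMPLE_SIZES):
        mu = rng.gammavariate(shape, 1.0 / rate) if i in non_null else 0.0
        p_values.append(two_sided_p(rng.gauss(math.sqrt(w) * mu, 1.0)))
    return p_values


def fisher_bhpc(p_values, r):
    """Fisher combination of the n - r + 1 largest p-values."""
    kept = sorted(p_values)[r - 1:]
    half = -sum(math.log(max(p, 1e-300)) for p in kept)   # half the chi-square statistic
    # Closed-form survival function of a chi-square with 2 * len(kept) degrees of freedom.
    term, total = 1.0, 0.0
    for j in range(len(kept)):
        total += term
        term *= half / (j + 1)
    return min(1.0, math.exp(-half) * total)


def simes_bhpc(p_values, r):
    """Simes combination of the n - r + 1 largest p-values."""
    kept = sorted(p_values)[r - 1:]
    m = len(kept)
    return min(1.0, min(m * p / (j + 1) for j, p in enumerate(kept)))


def estimate_power(combine, r0, mu0, sigma0, r=2, alpha=0.05, reps=2000, seed=0):
    """Fraction of simulated data sets in which the PC null is rejected."""
    rng = random.Random(seed)
    hits = sum(combine(simulate_p_values(r0, mu0, sigma0, rng), r) <= alpha
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for name, combine in [("Fisher", fisher_bhpc), ("Simes", simes_bhpc)]:
        print(name, estimate_power(combine, r0=4, mu0=0.1, sigma0=0.05))
```

Calling `estimate_power` over a grid of $(\mu_0, \sigma_0)$ values for each $r_0$
would give power maps of the kind described next; setting `r0=1` checks the level
under the PC null.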

We compare three GBHPC p-values, whose meta-analysis p-values $g_u$ are from
Examples 1 to 3 respectively. The results are shown in Figure 2. For each $r_0$ and
each form of the GBHPC p-value, we plot a power map against $(\mu_0, \sigma_0)$. To
better illustrate the difference across methods, for the Simes and weighted
Stouffer GBHPC p-values, we plot their powers after subtracting the power of
Fisher’s BHPC p-value at the corresponding location.

Here are some observations from the simulation results. First, Figure 2 shows that
when $r_0 = 2$, the Simes BHPC p-value is the most powerful in a large region of
the alternative space. The reason is that, for each subset of hypotheses with size
$n - r + 1 = n - 1$, in the worst case there is only one non-null individual
hypothesis, where Simes should be most powerful in detecting extreme p-values.
Second, the weighted Stouffer GBHPC p-value can have higher power than Fisher’s
BHPC p-value when $r_0 > r = 2$ and when the effect heterogeneity across non-null
studies ($\sigma_0$) is not too large to dominate the average effect ($\mu_0$).
Notice that we are using two-sided p-values to make a fair comparison of the
methods, thus taking $\sqrt{W_i}$ as weights in Stouffer’s method would not be
optimal. However, as we have discussed in Example 3 and shown here, it can still
provide a good test with two-sided p-values. If the individual p-values were
one-sided, then we would have seen an even higher power gain. Finally, the three
methods do not differ noticeably in power when the power is too low or too high.
Most of the difference appears when the power is in the range of 0.4 to 0.6.


5 Real data analysis 

Compared with BHPC p-values, GBHPC p-values have the flexibility to make use of
the possibly complex dependency structure across studies, and thus may achieve
higher power. We use a real data example to illustrate this benefit.

The dataset (Ruff et al., 2014) is a pooled dataset from four randomized clinical
trials aiming to measure the relative benefit of new anticoagulants (NOAC) compared
with an old drug called warfarin for stroke prevention. One primary goal in the
original paper was to assess and compare the efficacy of these new drugs in
different clinical subgroups of patients. The subgroups and the data are shown in
Table 2a. The data ($m/N$) record that among $N$ samples the number of samples who
suffer a stroke or systemic embolic event is $m$. Thus, to test the difference
between using the old and new drug, we estimate the odds ratio and compute p-values
for each subgroup using Fisher’s exact test.
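
The per-subgroup computation can be sketched in Python as follows. This is a
minimal sketch, not the code used for Table 2a: it assumes the sample odds ratio
and the two-sided convention that sums the probabilities of all tables no more
likely than the observed one (other two-sided conventions exist), and the function
names are hypothetical.

```python
import math


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k), computed with log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def odds_ratio(m1, n1, m2, n2):
    """Sample odds ratio of an event in arm 1 (m1/n1) versus arm 2 (m2/n2)."""
    return (m1 / (n1 - m1)) / (m2 / (n2 - m2))


def fisher_exact_two_sided(m1, n1, m2, n2):
    """Two-sided Fisher exact p-value for rows (m1, n1 - m1) and (m2, n2 - m2).

    With the margins held fixed, the event count in arm 1 is hypergeometric
    under the null; we sum the probabilities of all counts that are no more
    likely than the observed one.
    """
    events = m1 + m2
    log_denominator = log_choose(n1 + n2, events)

    def log_prob(a):
        return log_choose(n1, a) + log_choose(n2, events - a) - log_denominator

    observed = log_prob(m1)
    lowest, highest = max(0, events - n2), min(n1, events)
    total = 0.0
    for a in range(lowest, highest + 1):
        lp = log_prob(a)
        if lp <= observed + 1e-7:        # small tolerance for floating-point ties
            total += math.exp(lp)
    return min(1.0, total)
```

For the age < 75 row of Table 2a, `odds_ratio(496, 18073, 578, 18004)` is about
0.85, and `fisher_exact_two_sided(496, 18073, 578, 18004)` gives the matching
two-sided p-value.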

We can use PC tests to assess the consistency of the drug efficacy across different
subgroups, which is one of the major interests of the original paper. One major
difficulty in applying PC tests (and meta-analysis) to these subgroups is that the
subgroups are correlated. For example, the subgroup of age < 75 clearly overlaps
with the subgroup of female. When there is unknown dependency across individual
hypotheses, the only BHPC p-values available are those whose meta-analysis p-value
$g$ is either the Bonferroni or the Simes p-value. The left column of Table 2b
gives the values of the Bonferroni BHPC p-values for $H_0^{r/n}$ with $n = 18$ and
$r$ changing from 2 to 18. As the Bonferroni BHPC p-value is increasing in $r$
except when $r$ changes from 16 to 17, the values of the Simes BHPC p-values will
agree with the Bonferroni BHPC p-values except when $r = 16$.
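
As a rough check, the Bonferroni column can be recomputed from the individual
p-values. The Python sketch below assumes the Bonferroni BHPC p-value has the form
$(n - r + 1)\,p_{(r)}$ from Benjamini and Heller (2008), with $p_{(r)}$ the $r$-th
smallest p-value. The p-values are copied from Table 2a as printed, so results
agree with the table only up to rounding of those inputs, and the names are
hypothetical.

```python
# Individual p-values as printed in Table 2a (18 subgroups).
P_VALUES = [
    9.26e-03, 6.61e-05,              # age
    5.00e-04, 2.38e-03,              # sex
    2.93e-04, 3.81e-03,              # diabetes
    4.65e-05, 2.14e-02,              # previous stroke or TIA
    6.24e-03, 5.85e-06, 9.64e-01,    # creatinine clearance
    7.83e-02, 1.05e-01, 5.21e-05,    # CHADS2 score
    2.19e-05, 1.61e-02,              # VKA status
    2.49e-05, 4.68e-03,              # centre-based TTR
]


def bonferroni_bhpc(p_values, r):
    """Bonferroni BHPC p-value: (n - r + 1) times the r-th smallest p-value."""
    n = len(p_values)
    return min(1.0, (n - r + 1) * sorted(p_values)[r - 1])


if __name__ == "__main__":
    for r in range(2, len(P_VALUES) + 1):
        print(r, f"{bonferroni_bhpc(P_VALUES, r):.2E}")
```

The printed sequence dips between $r = 16$ and $r = 17$, which is the
non-monotonicity mentioned above.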

The question is: is it possible to get smaller valid p-values? The answer is yes,
by using a non-symmetric GBHPC p-value. Notice that the groups which represent
different levels of the same grouping factor do not share samples, and thus have
independent p-values. For example, the three p-values for the three CHADS2 groups
are independent. Let $1:18 = \bigcup_{i=1}^{8} I_i$ where each $I_i$ is the index
set of subgroups using the $i$th grouping factor. Then, to build a GBHPC p-value
$f^*$, we construct $g_u(p_u)$ for each $u \subset 1:n$ with $|u| = n - r + 1$ as
follows: for each $I_i \cap u \neq \emptyset$, a Fisher’s p-value (Example 2)
$p_{u,i}$ is calculated on $p_{I_i \cap u}$, then

$$g_u(p_u) = |\{i : u \cap I_i \neq \emptyset\}| \cdot \min_i p_{u,i},$$

which is a Bonferroni combination of p-values across the grouping factors. The
above construction obviously provides a valid GBHPC p-value.
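
For small problems this construction can be evaluated by brute force over all
subsets $u$. The Python sketch below is an illustration only: it enumerates every
$u$ of size $n - r + 1$, which is feasible for $n = 18$ but is not the shortcut
described next, and it uses the Table 2a p-values as printed, so results match
Table 2b only up to rounding of those inputs. The names `GROUPS`,
`fisher_combination`, `g_u` and `gbhpc` are hypothetical.

```python
import math
from itertools import combinations

# Table 2a p-values as printed, grouped by factor.
GROUPS = {
    "Age": [9.26e-03, 6.61e-05],
    "Sex": [5.00e-04, 2.38e-03],
    "Diabetes": [2.93e-04, 3.81e-03],
    "Stroke or TIA": [4.65e-05, 2.14e-02],
    "Creatinine": [6.24e-03, 5.85e-06, 9.64e-01],
    "CHADS2": [7.83e-02, 1.05e-01, 5.21e-05],
    "VKA": [2.19e-05, 1.61e-02],
    "TTR": [2.49e-05, 4.68e-03],
}


def fisher_combination(p_values):
    """Fisher's combined p-value for independent p-values (closed-form chi-square tail)."""
    half = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 0.0
    for j in range(len(p_values)):
        total += term
        term *= half / (j + 1)
    return min(1.0, math.exp(-half) * total)


def g_u(kept):
    """Bonferroni across factors of the within-factor Fisher p-values on the kept indices."""
    by_factor = {}
    for factor, level in kept:
        by_factor.setdefault(factor, []).append(GROUPS[factor][level])
    combined = [fisher_combination(ps) for ps in by_factor.values()]
    return min(1.0, len(combined) * min(combined))


def gbhpc(r):
    """Maximum of g_u over all index sets u of size n - r + 1."""
    index = [(factor, level) for factor, ps in GROUPS.items() for level in range(len(ps))]
    n = len(index)
    return max(g_u(u) for u in combinations(index, n - r + 1))
```

With these inputs, `gbhpc(2)`, `gbhpc(3)` and `gbhpc(4)` come out close to the
GBHPC column of Table 2b.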

The next concern is the computation of $f^*$. We claim that for any given $r$, it
can be quickly computed by checking Table 2c. Table 2c lists all $p_{u,i}$ values
that can possibly affect the value of $f^*$ for any $r$. Notice that

$$f^*(p) = \max_{u,\ |u| = n-r+1} |\{i : u \cap I_i \neq \emptyset\}| \cdot \min_i p_{u,i}.$$

Since $p_{u,i}$ is symmetric, it can possibly influence $f^*$ only when $p_{u,i}$
is the Fisher’s combination of the largest $|u \cap I_i|$ p-values in $I_i$. This
explains the values in Table 2c. Then, computing $f^*$ becomes easy. One nice
feature of the p-values in Table 2c is that the p-values on the first row are
uniformly smaller than the p-values on the second row, which are in turn uniformly
smaller than the p-values on the third row. Such a property can greatly simplify
the computation. The calculation of $g_u(p_u)$ amounts to replacing some
$p_{I_i,i}$ (first row p-value in Table 2c) with some larger $p_{u,i}$ (a p-value
in a later row) or completely removing it. For example, when $r = 4$, there are
$r - 1 = 3$ indices that are not in $u$. Thus for any $u$, at most 3 of the
$p_{I_i,i}$ can be replaced. If we denote
$p_{I_{(1)},(1)} \le \cdots \le p_{I_{(8)},(8)}$, then it is easy to see that the
GBHPC p-value is $8\,p_{I_{(4)},(4)}$.

It is a bit more complicated when $r > 9$. First, notice that for the $r - 1$
indices that are not in $\arg\max_u g_u(p_u)$, we should have each
$I_i \cap (-u) \neq \emptyset$ to avoid the appearance of p-values on the first row
in Table 2c. Then the problem becomes optimizing the location of the remaining
$r - 1 - 8$ indices to maximize $g_u(p_u)$. For example, when $r = 12$, there are 3
indices left and we now just need to examine the p-values on the second and third
rows in Table 2c. If we denote the p-values on the second row as $p_{u_i,i}$ and
define $p_{u_{(1)},(1)} \le \cdots \le p_{u_{(8)},(8)}$, then using a similar
argument as for $r = 4$, one can check that the GBHPC p-value is

$$\max\{7\,p_{u_{(2)},(2)},\ 6\,p_{u_{(3)},(3)},\ 5\,p_{u_{(4)},(4)}\}.$$

Finally, the GBHPC p-values for all $r = 2, \ldots, 18$ are shown in Table 2b.
Compared with Bonferroni BHPC p-values, the new GBHPC p-values can be much smaller,
especially when $r$ is small. Both methods give a 95% confidence interval of the
true proportion of non-null hypotheses $r_0/n$ as $r_0/n \in [0.72, 1]$.


6 Conclusion and future work


Partial conjunction hypotheses are natural hypotheses to test for measuring repeated
effects across settings/studies. The null is rejected only when at least $r$
hypotheses are non-null. By testing PC hypotheses at different $r$ values, one can
also construct a confidence interval of $r_0/n$, the true proportion of non-null
hypotheses.

This paper characterizes the admissible p-values for a partial conjunction test
of independent hypotheses or hypotheses with positively dependent p-values, within
the class of non-decreasing p-values. Any monotone admissible p-value for
$H_0^{r/n}$ is the maximum of the non-decreasing p-values for the global null in
each combination of $n - r + 1$ hypotheses, which we call GBHPC p-values. We have
shown that for sensitive GBHPC p-values, as long as each meta-analysis p-value of
the $n - r + 1$ hypotheses is admissible, the combined p-value is monotone
admissible. A consequence is that among combined p-values that only depend on the
order statistics of individual p-values, the original BHPC p-values are the only
monotone admissible ones. We have also found that GBHPC p-values are inadmissible
without the monotonicity constraint. However, the dominating tests only have a
moderate power gain in low power regions of the alternative space. Since these
counter-examples are not monotone, they are hard to interpret in practice and thus
not reasonable choices.

In summary, we illustrated the properties of tests for a PC hypothesis and
characterized a class of good tests called GBHPC p-values. Compared with their
symmetric form, the BHPC p-values, GBHPC p-values have more flexibility to adapt to
complicated problem structure, and thus can gain power in important regions of the
alternative space, as we showed in our simulations and real data example. The
computational cost of non-symmetric GBHPC p-values can be a concern, but there are
special cases where GBHPC p-values are computable. One future direction is to
expand the applications where computable GBHPC p-values are available.

Another direction is to understand properties of the confidence interval for $r$
constructed from GBHPC p-values. In Section 5, the results showed that though the
newly proposed GBHPC p-values can be much smaller than Simes or Bonferroni BHPC
p-values at many $r$ values, the confidence intervals that the two methods
construct are still the same. Finally, there are variations of partial conjunctions
that are useful in practice. For example, the count of replicability may vary for
different hypotheses. Replication of effects from two distinct classes can be of
more interest than replication in two similar classes. Another variation is to
require that a null hypothesis is rejected only when there are at least $r$
non-nulls with the same sign of effect. Such hypotheses can have very complex
alternative and null spaces, and understanding their properties is left for future
work.

Figure 1: The top two plots: rejection regions of $\varphi'$ and $\tilde{\varphi}$ in
the p-value space and the test statistic space, using $\alpha = 0.2$. The solid
shaded region is the rejection region of $\varphi'$, while the rejection region of
$\tilde{\varphi}$ also includes the dotted shaded squares. The bottom two plots:
power comparison of $\varphi$ and $\tilde{\varphi}$. The dashed line is where the
power is 0.15 and the dotted line is where the power is 0.5.

Figure 2: Power comparison of GBHPC p-values: Each row is for one $r_0$ value and
each column is for one form of GBHPC p-value. The significance level is
$\alpha = 0.05$. The first column shows power maps of Fisher’s BHPC p-value against
$(\mu_0, \sigma_0)$. The last two columns show the power difference of the Simes and
weighted Stouffer GBHPC p-values against Fisher’s BHPC p-value. The green color
indicates a power loss compared with Fisher’s BHPC p-value while the blue color
shows a power gain.

(a)

Subgroup                         Pooled NOAC    Pooled warfarin   Odds ratio   p-value
                                 (events)       (events)
Age
  < 75                           496/18073      578/18004         0.85         9.26E-03
  ≥ 75                           415/11188      532/11095         0.76         6.61E-05
Sex
  Female                         382/10941      478/10839         0.78         5.00E-04
  Male                           531/18371      634/18390         0.83         2.38E-03
Diabetes
  No                             622/20216      755/20238         0.82         2.93E-04
  Yes                            287/9096       356/8990          0.79         3.81E-03
Previous stroke or TIA
  No                             483/20699      615/20637         0.78         4.65E-05
  Yes                            428/8663       495/8635          0.85         2.14E-02
Creatinine clearance (mL/min)
  < 50                           249/5539       311/5503          0.79         6.24E-03
  50-80                          405/13055      546/13155         0.74         5.85E-06
  > 80                           256/10626      255/10533         1.00         9.64E-01
CHADS2 score
  0-1                            69/5058        90/4942           0.75         7.83E-02
  2                              247/9563       290/9757          0.87         1.05E-01
  3-6                            596/14690      733/14528         0.80         5.21E-05
VKA status
  Naive                          386/13789      513/13834         0.75         2.19E-05
  Experienced                    522/15514      597/15395         0.86         1.61E-02
Centre-based TTR
  < 66                           509/16219      653/16297         0.78         2.49E-05
  ≥ 66                           313/12742      392/12904         0.80         4.68E-03

(b)

 r    Bonferroni BHPC   GBHPC
 2    3.73E-04          4.49E-05
 3    3.98E-04          4.66E-05
 4    6.98E-04          7.50E-05
 5    7.29E-04          1.18E-04
 6    8.59E-04          1.31E-04
 7    3.52E-03          1.39E-04
 8    5.50E-03          4.23E-04
 9    2.38E-02          1.90E-02
10    3.43E-02          2.66E-02
11    3.75E-02          2.81E-02
12    4.37E-02          4.63E-02
13    5.56E-02          6.45E-02
14    8.07E-02          6.45E-02
15    8.56E-02          7.36E-02
16    2.35E-01          7.36E-02
17    2.11E-01          2.11E-01
18    9.64E-01          9.64E-01

(c)

               Age        Sex        Diabetes   Stroke or TIA   Creatinine   CHADS2     VKA        TTR
H_0^{1/k_i}    9.37E-06   1.74E-05   1.64E-05   1.47E-05        5.83E-06     5.29E-05   5.61E-06   1.98E-06
H_0^{2/k_i}    9.26E-03   2.38E-03   3.81E-03   2.14E-02        3.68E-02     4.78E-02   1.61E-02   4.68E-03
H_0^{3/k_i}    -          -          -          -               9.64E-01     1.05E-01   -          -

Table 2: (a) The original data and individual p-values: the blue color highlights
individual p-values that are not significant at level $\alpha = 0.05$; (b) the
values of the Bonferroni BHPC p-value and the new GBHPC p-values when $r$ changes
from 2 to 18; (c) the grouping factor level combined p-values: here $k_i = |I_i|$
and each p-value is the Fisher’s BHPC p-value on $p_{I_i}$ testing for
$H_0^{r_i/k_i}$ where $r_i = 1, 2, 3$.